Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
Abstract
This paper considers the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. These methods use least-squares-based algorithms to estimate the expected reward, which makes them sensitive to outliers. To deal with outliers, we propose a novel robust actor-critic contextual bandit method for mHealth interventions. In the critic update, the capped-$\ell_{2}$ norm is used to measure the approximation error, which prevents outliers from dominating the objective. The critic update also yields a set of sample weights, which define a weighted objective for the actor update: samples that are badly corrupted by noise in the critic update receive zero weight in the actor update. As a result, both the actor and the critic updates become more robust. The capped-$\ell_{2}$ norm has a key parameter, the threshold at which the norm is capped. We provide a reliable method to set it properly, based on one of the most fundamental definitions of outliers in statistics. Extensive experiments show that our method achieves almost the same results as state-of-the-art methods on a dataset without outliers and dramatically outperforms them on datasets corrupted by outliers.
Similar papers
Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions
We consider the actor-critic contextual bandit for the mobile health (mHealth) intervention. State-of-the-art decision-making algorithms generally ignore the outliers in the dataset. In this paper, we propose a novel robust contextual bandit method for mHealth. It can achieve the conflicting goals of reducing the influence of outliers while seeking a similar solution compared with the st...
Faster algorithms for SVP and CVP in the $\ell_{\infty}$ norm
Blomer and Naewe [BN09] modified the randomized sieving algorithm of Ajtai, Kumar and Sivakumar [AKS01] to solve the shortest vector problem (SVP). The algorithm starts with $N = 2^{O(n)}$ randomly chosen vectors in the lattice and employs a sieving procedure to iteratively obtain shorter vectors in the lattice. The running time of the sieving procedure is quadratic in $N$. We study this problem ...
Semiparametric Contextual Bandits
This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bou...
Adaptive Representation Selection in Contextual Bandit with Unlabeled History
We consider an extension of the contextual bandit setting, motivated by several practical applications, where an unlabeled history of contexts can become available for pre-training before the online decision-making begins. We propose an approach for improving the performance of contextual bandits in such a setting, via adaptive, dynamic representation learning, which combines offline pre-training o...
Joint Capped Norms Minimization for Robust Matrix Recovery
Low-rank matrix recovery is an important machine learning research topic with various scientific applications. Most existing low-rank matrix recovery methods relax the rank minimization problem via trace norm minimization. However, such a relaxation makes the solution seriously deviate from the original one. Meanwhile, most matrix recovery methods minimize the squared prediction errors ...
Journal title:
- CoRR
Volume: abs/1708.05446
Publication date: 2017